feat: Issue Deduplication #11
Unused types (1)
@0x4007 I have tried to make a few examples; let me know if you have more or want to test more. Screen.Recording.2024-09-13.at.7.08.57.PM-1.mov
- Adding labels is out of scope. Don't do that. Close it as unplanned, don't add any labels.
- Add a match percentage as well when any are listed.
- How did you generate the test cases and determine their percentage similarity?
src/handlers/issue-deduplication.ts
Outdated
import { IssueSimilaritySearchResult } from "../adapters/supabase/helpers/issues";
import { Context } from "../types";
const MATCH_THRESHOLD = 0.95;
const WARNING_THRESHOLD = 0.5;
Why did you do 50%?
A cosine similarity of 0.75 appears quite close for identifying similar issues. I tested this with a few examples and noticed some potential errors with the samples. Typically, for similar issues, the similarity was either above 75% (in line with the 95% category) or around 60%. Therefore, I experimented with a 50% threshold, which seemed to work well.
Added. The list now displays the cosine similarity as a percentage after each issue.
I created the test cases manually, using embeddings to compute their cosine similarity values.
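For readers following the threshold discussion, here is a minimal sketch of how a cosine similarity between two embedding vectors can be computed and shown as a percentage. The function names `cosineSimilarity` and `toPercent` are hypothetical and are not taken from this PR, which may compute similarity elsewhere (for example inside the database query).

```typescript
// Hypothetical helper, not from the PR: cosine similarity of two
// embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: format a similarity in [0, 1] as a whole percentage.
function toPercent(similarity: number): string {
  return `${Math.round(similarity * 100)}%`;
}
```

Vectors pointing in the same direction score 1 regardless of magnitude, and orthogonal vectors score 0, which is why the thresholds in this thread are expressed as fractions of 1.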
Can you link your issue where you tested so we can see the results?
95%: 50%: I have deployed the plugin at Plugin Link, if you wish to try it. The issue test values: Link
Okay, it seems like you aren't following the spec again. It needs to list the similar results in every scenario. Do 75% and 95% as a default.
Fixed that; it now returns the similar issues in both cases.
The warning threshold is now 75%. 95%: 75%:
Doesn't look like it in the first one
That's the first issue of that type, so it's expected not to have similar issues. The second issue is the first one for which a similar issue is found with a similarity above 95%. So the first issue would not satisfy any of the match conditions. The third issue does not have any similar issues, so it wouldn't get a message.
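The behavior described in this thread can be sketched roughly as below, assuming the 75% and 95% defaults requested in review. The `SimilarIssue` shape, the `decide` function, and the action names are hypothetical illustrations, not the PR's actual code; the real `IssueSimilaritySearchResult` type may look different, and whether the boundaries are inclusive is also an assumption.

```typescript
// Hypothetical result shape; the PR's IssueSimilaritySearchResult may differ.
interface SimilarIssue {
  issueUrl: string;
  similarity: number; // cosine similarity in [0, 1]
}

// Defaults requested in review: 75% warning, 95% match.
const MATCH_THRESHOLD = 0.95;
const WARNING_THRESHOLD = 0.75;

type Action = "close_as_unplanned" | "warn" | "none";

interface Decision {
  action: Action;
  listed: string[]; // similar issues, each with its match percentage
}

function decide(results: SimilarIssue[]): Decision {
  // Keep everything at or above the warning threshold, best match first.
  // Inclusive comparison (>=) is an assumption.
  const similar = results
    .filter((r) => r.similarity >= WARNING_THRESHOLD)
    .sort((x, y) => y.similarity - x.similarity);

  // No similar issues (e.g. the first issue of its kind): no message.
  if (similar.length === 0) return { action: "none", listed: [] };

  // Similar issues are listed in every scenario, with a percentage.
  const listed = similar.map(
    (r) => `${r.issueUrl} (${Math.round(r.similarity * 100)}%)`
  );
  const action: Action =
    similar[0].similarity >= MATCH_THRESHOLD ? "close_as_unplanned" : "warn";
  return { action, listed };
}
```

In this sketch no labels are added in either branch, matching the review note that labeling is out of scope; the caller would only close the issue as unplanned or post a warning comment with the list.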
Cool just needs configuration and I can merge.
I'm assuming it all works. Code looks good.
Resolves #6